ABSTRACT



Aims. One of the most striking features predicted by standard models of galaxy formation is the presence of anti-correlations in the matter distribution on large enough scales ($r > r_c$). Simple arguments show that the location of the length scale $r_c$, marking the transition from positive to negative correlations, is the same for any class of objects as for the full matter distribution; i.e. it is invariant under biasing. Models predict this scale to lie close to the scale of the baryon acoustic oscillations, $r_{bao}$.

Methods. We test these predictions in the newest SDSS galaxy samples, where it is possible to measure correlations on ~ 100 Mpc/h scales both in the main galaxy (MG) and in the luminous red galaxy (LRG) volume-limited samples. Using three different estimators, we determine the redshift-space galaxy two-point correlation function.

Results. We find that, in several MG samples, the correlation function remains positive on scales > 250 Mpc/h, while it should be negative beyond $r_c \approx 120$ Mpc/h in the concordance LCDM. In other samples, the correlation function becomes negative on scales < 50 Mpc/h. To investigate the origin of these differences, we considered in detail the propagation of errors on the sample density into the estimation of the correlation function. We conclude that these errors are important at large enough separations and that they are responsible for the observed differences between different estimators and for the measured sample-to-sample variations in the correlation function. We show that in the LRG sample the scale corresponding to $r_{bao}$ cannot be detected because fluctuations in the density field are too large in amplitude. Previous measurements in similar samples have underestimated volume-dependent systematic effects.

Conclusions. We conclude that, in the newest SDSS samples, the large-scale behavior of the galaxy correlation function is affected by intrinsic errors and volume-dependent systematic effects that make the detection of correlations only an estimate of a lower limit of their amplitude, spatial extension, and statistical errors. We point out that these results represent an important challenge to LCDM models, as they differ greatly from its predictions.

1. Introduction

Standard models of galaxy formation (i.e., cold, warm and hot dark matter models) predict the two-point correlation function $\xi(r)$ of matter density fluctuations in the early universe, and they can make a simple prediction for that at the present time, in the regime of weak density perturbations, where fluctuations have been only linearly amplified by gravitational clustering in the expanding universe (Peebles, 1980). The various models differ in the values of the characteristic length scales and in the particular scale-behavior of $\xi(r)$. In general, this is characterized by three length scales and three different regimes.

(i) On scales smaller than $r_0$, where $\xi(r_0) = 1$, the matter distribution is characterized by strong clustering, i.e. $\xi(r) \gg 1$, about which little is known analytically and which is generally constrained by N-body simulations, where it is typically found that, for $r < r_0$, $\xi(r) \sim r^{-\gamma}$ with $\gamma \approx 1.5$ (Springel et al. 2005).

(ii) The second length scale $r_c$ is such that $\xi(r_c) = 0$, and it is located at $r_c \gg r_0$ (Peebles, 1993; Gabrielli et al. 2002). In the range of scales $r_0 < r < r_c$, $\xi(r)$ is characterized by positive correlations, which rapidly decay to zero when $r \to r_c$. This regime can be easily related to the early universe correlation function by a simple rescaling of amplitudes given by the linear gravitational growth of small amplitude perturbations in an expanding universe (Peebles, 1980). The scale $r_c$ is an imprint of the early universe physics. It corresponds to the size of the Hubble horizon at the time of the equality between matter and radiation and it is fixed by the values of standard cosmological parameters, being proportional to $(\Omega h^2)^{-1}$, where $\Omega$ is the density parameter and $h$ the normalized Hubble constant (Peacock 1999). The third length scale $r_{bao}$ is located on scales on the order of, but smaller than, $r_c$. This is the real-space scale corresponding to the baryon acoustic oscillations (BAO) at the recombination epoch. Its precise location depends on the matter density parameter, baryon abundance and Hubble constant (Eisenstein and Hu 1998).

(iii) Finally, in the third range of scales, namely for $r > r_c$, $\xi(r)$ is characterized by a negative power-law behavior, i.e. $\xi(r) \sim -r^{-4}$ (Gabrielli et al. 2002, 2005). Positive and negative correlations are exactly balanced in such a way that $\int \xi(r)\, d^3r = 0$. This is a global condition on the system fluctuations, which corresponds to the matter distribution being super-homogeneous (Gabrielli et al. 2002, 2005), i.e. characterized by a sort of stochastic order and by fluctuations that are depressed with respect to a purely uncorrelated distribution of matter (i.e. white noise). This corresponds to the linear behavior of the matter power spectrum as a function of the wavenumber $k$ for $k \to 0$ (named the Harrison-Zeldovich tail), and it characterizes not only the LCDM model but all models of density fluctuations in the framework of the Friedmann-Robertson-Walker metric (Gabrielli et al., 2002, 2005).

In the new samples provided by the Sloan Digital Sky Survey Data Release 7 (SDSS-DR7) (Abazajian et al. 2009), it is possible to estimate the galaxy correlation function on scales on the order of 100 Mpc/h and thus possibly to determine $r_{bao}$ and $r_c$. Some years ago, Eisenstein et al. (2005) determined the Landy and Szalay (1993) (LS) estimator of the galaxy two-point correlation function in a preliminary luminous red galaxy (LRG) sample of the SDSS, claiming an overall agreement with the LCDM prediction and a positive detection of the scale $r_{bao}$ at about 110 Mpc/h. More recently, Cabré and Gaztañaga (2008) measured the same estimator of the correlation function in the LRG-DR6 sample and Martínez et al. (2009) in the LRG-DR7 sample. They both found that the LRG correlation is positive up to 200 Mpc/h and that the shape of the correlation function around $r_{bao}$ is slightly different from the one measured by Eisenstein et al. (2005). While they claimed that the measured correlation function was compatible with the LCDM model, they did not discuss the fact that their detection implied that positive correlations extend to scales larger than the model-predicted $r_c$. In addition, we note that Eisenstein et al. (2005), Cabré and Gaztañaga (2008), and Martínez et al. (2009) did not discuss any estimator other than the LS one.

In the present paper, we show that our results agree very closely with those of the above-mentioned papers as far as the amplitude, shape and statistical error bars are concerned, in the case of the LS estimator in the LRG-DR7 sample. However, we measure that in the SDSS-DR7 main galaxy (MG) sample the two-point correlation function (LS estimator) remains positive at large separations, i.e. for $r > 250$ Mpc/h, showing a clear systematic volume-dependent behavior and a remarkable disagreement with the LCDM prediction. In addition, we find that there is a difference between the LS and the Davis and Peebles (1983) (DP) estimators of the two-point correlation function in redshift space. Finally, we find that both estimators vary significantly in different sky regions. We interpret these results by studying the fluctuations in the sample density estimation.

The paper is organized as follows. We first define in Sect. 2 the estimators of the correlation function and a simple determination of its statistical errors that we use in the data analysis. Sect. 3 is devoted to the description of the sample selection, while in Sect. 4 we present our main results. The discussion of the behaviors we have found and their interpretation is presented in Sect. 5. The behavior of the two-point correlation function predicted by standard models of galaxy formation and the comparison with the results obtained are discussed in Sect. 6. Finally, we draw our main conclusions in Sect. 7.



2. Pairwise estimators 

In what follows we determine the two-point correlation properties by using the LS, DP and Hamilton (H) (Hamilton 1993) estimators. These estimators may have a number of systematic biases when correlations are long range, as we discuss in Sect. 5. First, it is useful to discuss their properties and how they are determined.



The LS estimator is defined as

$$\xi_{LS}(r) = \frac{N_r(N_r-1)}{N_d(N_d-1)}\,\frac{DD(r)}{RR(r)} - 2\,\frac{N_r-1}{N_d}\,\frac{DR(r)}{RR(r)} + 1 , \qquad (1)$$

where $DD(r)$, $RR(r)$ and $DR(r)$ are the number of data-data, random-random and data-random pairs, and $N_r$, $N_d$ are the number of random and data points (we use $N_r = K \cdot N_d$ with $K > 1$, and we have checked that the results do not significantly depend on $K$ as long as this is larger than unity).

The DP estimator is defined as

$$\xi_{DP}(r) = \frac{N_r}{N_d-1}\,\frac{DD(r)}{DR(r)} - 1 , \qquad (2)$$

and the H estimator can be written as

$$\xi_{H}(r) = \frac{N_r N_d}{(N_r-1)(N_d-1)}\,\frac{DD(r)\,RR(r)}{DR^2(r)} - 1 . \qquad (3)$$
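The three definitions above translate directly into a few lines of code. The following is a minimal sketch, not the pipeline used for the measurements: the function name `pair_estimators` and its arguments are hypothetical, and it assumes that `dd` and `rr` are ordered pair counts (each pair counted twice) while `dr` counts each data-random pair once, which is the convention under which the three prefactors are mutually consistent.

```python
import numpy as np

def pair_estimators(dd, dr, rr, n_d, n_r):
    """Return (xi_LS, xi_DP, xi_H) per separation bin from raw pair counts.

    Assumption: dd and rr are ordered pair counts, dr counts every
    data-random pair once; n_d and n_r are the numbers of data and
    random points.
    """
    dd, dr, rr = (np.asarray(x, dtype=float) for x in (dd, dr, rr))
    # Landy-Szalay estimator
    xi_ls = (n_r * (n_r - 1) / (n_d * (n_d - 1)) * dd / rr
             - 2.0 * (n_r - 1) / n_d * dr / rr + 1.0)
    # Davis-Peebles estimator
    xi_dp = n_r / (n_d - 1) * dd / dr - 1.0
    # Hamilton estimator
    xi_h = n_r * n_d / ((n_r - 1) * (n_d - 1)) * dd * rr / dr**2 - 1.0
    return xi_ls, xi_dp, xi_h
```

With pair counts that are exactly proportional to the number of available pairs, all three estimators return zero, which is a quick sanity check on the normalization.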

In general, a statistical estimator $X_V$ of the statistical quantity $X$ in a finite sample of volume $V$, to be a valid one, must satisfy the following limit condition

$$\lim_{V\to\infty} X_V = \langle X \rangle ,$$

where the brackets denote the ensemble average (infinite volume limit). A stronger condition is that

$$\langle X_V \rangle = \langle X \rangle ,$$

i.e. that the ensemble average in a finite volume is equal to the ensemble average in the infinite volume limit. If this condition is not satisfied the estimator is said to be biased (Kerscher 1999; Gabrielli et al., 2005). One wants to understand the bias and the variance of the various estimators, but this is possible only for some specific estimators and for distributions with simple correlation properties (e.g. Poisson). The effect of bias, i.e. finite volume or size effects, can be studied through the analysis of artificial simulations with known properties; however, the three estimators defined above are all biased (Kerscher 1999; Kerscher et al. 2000; Sylos Labini and Vasilyev, 2008). It is worth noticing that Kerscher (1999) showed that, in a real galaxy sample, the three estimators defined above use different finite-size corrections, leading to different results on large enough scales, where the correlation amplitude is small, while all of them agree on smaller scales, where the amplitude of the correlation is large enough.

It was shown (Landy and Szalay 1993) that the LS estimator has the minimal variance for a Poisson distribution, i.e. the variance decays as $1/N$ instead of as $1/\sqrt{N}$ as for the DP estimator. This fact, however, does not mean that its variance will be any more controllable for a wider class of distributions with more complex correlation properties than Poisson's (Gabrielli et al. 2005). Indeed, there is no formal proof that the DP is less accurate than the LS for a generally correlated point distribution, even though this conclusion has been reached by, e.g., Kerscher et al. (2000), examining some specific properties of estimators in N-body simulations. They concluded also that the H estimator is equivalent to the LS one. In Sylos Labini and Vasilyev (2008), by studying finite volume effects in the estimators, it was shown that the LS and H estimators are indeed indistinguishable, but that they are almost equivalent to the DP when the underlying distribution is positively correlated.

Among the various ways to compute statistical errors (Sylos Labini and Vasilyev, 2008) we use the jack-knife (JK) estimate, whose variance is (Scranton et al. 2002)

(4)

where the index $i$ signifies that the value of the correlation function is computed each time in all the sub-samples of a given sample but one (the $i$-th).
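The leave-one-out procedure just described can be sketched as follows. Since the explicit form of Eq. (4) is not reproduced here, the code uses the textbook delete-one jack-knife variance with the prefactor $(n-1)/n$; this prefactor, the function name `jackknife_variance` and the array layout are assumptions for illustration, and the weighting adopted by Scranton et al. (2002) may differ.

```python
import numpy as np

def jackknife_variance(xi_full, xi_leave_one_out):
    """Delete-one jack-knife variance of xi(r) in each separation bin.

    xi_full: shape (n_bins,), estimate from the whole sample.
    xi_leave_one_out: shape (n_sub, n_bins); row i is the estimate
    obtained with sub-field i removed from the sample.
    """
    xi_full = np.asarray(xi_full, dtype=float)
    xi_loo = np.asarray(xi_leave_one_out, dtype=float)
    n_sub = xi_loo.shape[0]
    # Standard delete-one prefactor (assumed, not taken from Eq. 4)
    return (n_sub - 1) / n_sub * np.sum((xi_loo - xi_full) ** 2, axis=0)
```

The square root of the returned array gives the error bar attached to each bin of the correlation function.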



3. The samples 

We have constructed several sub-samples of the main-galaxy (MG) and the luminous-red-galaxy (LRG) samples of the spectroscopic catalog SDSS-DR7. For the MG sample, we have constrained the flags indicating the type of object to select only the galaxies from the MG sample. We then consider galaxies in the redshift range $10^{-4} < z < 0.3$ with redshift confidence $z_{conf} \geq 0.35$ and with flags indicating no significant redshift determination errors. In addition we apply the apparent magnitude filtering condition $r < 17.77$ (Strauss et al. 2002).

The angular region we consider is limited, in the SDSS internal angular coordinates, by $-33.5° < \eta < 36.0°$ and $-48.0° < \lambda < 51.5°$: the resulting solid angle is $\Omega = 1.85$ steradians. We do not use corrections for the redshift completeness mask or for fiber collision effects. Fiber collisions in general do not present a problem for measurements of large scale galaxy correlations (Strauss et al. 2002). Completeness varies most near the current survey edges, which are excluded in our samples. The completeness mask takes into account that the fraction of observed galaxies is not the same in all the fields, because of both fiber collision effects and small variations in limiting magnitude. One can, under certain assumptions, take into account the completeness mask information in the statistical analysis. Otherwise it is possible to make tests by varying the limits in apparent magnitude and studying the stability of the results obtained. We have followed this second approach and we did not find appreciable variations in the measured statistical properties when $r < 17.5$ (Sylos Labini et al. 2009d). This conclusion is confirmed by the fact that our results for the LRG sample agree with those of Eisenstein et al. (2005), Cabré and Gaztañaga (2008), and Martínez et al. (2009), and for the MG sample with those of Zehavi et al. (2005a,b), who have explicitly taken into account the completeness mask of the survey in their analysis. As noticed by Cabré and Gaztañaga (2008), the completeness mask could be the main source of systematic effects on small scales only, while we are interested in the correlation function at relatively large separations.

To construct volume-limited (VL) samples (see Table 1) we computed the metric distances using the standard cosmological parameters, i.e., $\Omega_M = 0.3$ and $\Omega_\Lambda = 0.7$ with $H_0 = 100h$ km/sec/Mpc. We computed absolute magnitudes using Petrosian apparent magnitudes in the r filter corrected for Galactic absorption.
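The distance and magnitude computation described above can be illustrated with a short numerical sketch. It assumes that the metric distance is the comoving distance in a flat universe and that luminosities follow from the luminosity distance $(1+z)$ times the comoving distance; the function names, the `k_corr` argument and the trapezoidal integration are illustrative choices, not the code actually used for the samples.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s

def comoving_distance(z, omega_m=0.3, omega_l=0.7, n_steps=2000):
    """Comoving distance in Mpc/h for H0 = 100 h km/s/Mpc (flat universe assumed)."""
    zz = np.linspace(0.0, z, n_steps + 1)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zz) ** 3 + omega_l)
    dz = zz[1] - zz[0]
    # Trapezoidal rule for the integral of dz / E(z)
    integral = dz * (inv_e.sum() - 0.5 * (inv_e[0] + inv_e[-1]))
    return C_KM_S / 100.0 * integral

def absolute_magnitude(m_app, z, k_corr=0.0, **cosmo):
    """M - 5 log10(h) from an apparent magnitude, via the luminosity distance."""
    d_lum = (1.0 + z) * comoving_distance(z, **cosmo)  # Mpc/h
    return m_app - 5.0 * np.log10(d_lum) - 25.0 - k_corr

def in_volume_limited_sample(dist, abs_mag, r_min, r_max, m_a, m_b):
    """True where a galaxy lies inside the distance and magnitude limits."""
    m_lo, m_hi = min(m_a, m_b), max(m_a, m_b)
    return (dist >= r_min) & (dist <= r_max) & (abs_mag >= m_lo) & (abs_mag <= m_hi)
```

For example, with the default parameters `comoving_distance(0.1)` is close to 293 Mpc/h, so a galaxy at $z = 0.1$ falls inside the distance limits of a sample cut between 150 and 500 Mpc/h.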

We checked that our main results in the MG sample do not depend on K-corrections and/or evolutionary corrections such as those used by Blanton et al. (2003). In this paper we use standard K-corrections from the VAGC data¹ (see the discussion in Sylos Labini et al. (2009d) for more details).

Concerning the LRG sample, we have selected all the objects that are classified as "galaxy" and that belong to the "Cut I" subset of the Galaxy Red objects, with the same redshift quality criteria as for main galaxies. As this is only roughly a VL sample, we have applied cuts in absolute magnitude $M$ and distance $R$ to obtain a rectangular area in the $M - R$ diagram. In addition, because evolutionary effects are small for LRG galaxies (Eisenstein et al. 2005), we have not applied further corrections to these data. Given that we have selected a truly VL sample, we did not apply a further redshift-dependent weighting to the data.

Sample   R_min   R_max   M_min   M_max   Number of galaxies
VL1         50     200   -18.9   -21.1   72037
VL2        150     500   -21.1   -22.4   69999
VL3        200     600   -21.5   -22.7   42357
VL4         70     450   -20.8   -21.8   93821
LRG        570    1035   -20.5   -22.5   53066

Table 1. Properties of the SDSS-DR7 VL samples: $R_{min}$, $R_{max}$ (in Mpc/h) are the metric distance limits; $M_{min}$, $M_{max}$ the absolute magnitude limits in the r filter; the last column is the number of galaxies.

Fig. 1. Correlation function in the VL1 sample: both the LS and the DP estimators are reported. The solid line gives the prediction of the LCDM with $\Omega_m h^2 = 0.12$ (from Eisenstein et al. (2005)), linearly rescaled, according to the simplest biasing scheme (Kaiser 1984), to fit the amplitude on 10 Mpc/h. In the inset panel we show the same behavior but in a log-linear scale.



The sub-samples used to measure the JK errors are made by dividing the survey angular region we considered into 30 sub-fields, each of area ~ 200 deg². In this way there are a few thousand galaxies in each sub-sample.

4. Results 



¹ http://sdss.physics.nyu.edu/vagc/



We find, in agreement with Zehavi et al. (2002, 2005a) and Eisenstein et al. (2005) in previous data releases of the SDSS, that the redshift-space correlation function in different samples shows a different amplitude but a similar shape on small scales (see Figs. 1-4). This is usually ascribed to the (physical) effect of selection, namely that brighter galaxies exhibit a larger clustering amplitude (Zehavi et al. 2002, 2005a; Norberg et al., 2002). However, this is not the only change: the larger the correlation function amplitude, the more extended the range of scales where there are detectable (i.e. signal larger than JK errors) positive correlations. Indeed, in the MG samples the transition from positive to negative correlations occurs at a scale that grows roughly in proportion to the sample size, and in the deepest samples it is located on rather large scales, i.e. $r > 250$ Mpc/h. However, in the VL1 sample we find $r_c \approx 50$ Mpc/h, i.e. less than half of the LCDM prediction.

To show that finite-volume effects are important on large separations, we consider a single sample (VL4) and we cut it at different scales $R_{max}$; in addition we consider an angular cut of the LRG sample, for which the depth is fixed but the volume is lowered. In the latter case the whole angular region of ~ 6000 deg² is cut into two non-overlapping sky regions, each of area ~ 3000 deg², i.e. only ~ 20% smaller than the sample considered by Eisenstein et al. (2005). As one may notice from Figs. 5-7, there is a clear volume dependence of the two-point correlation function on large scales. In particular, in the R1 sub-sample there is an evident difference between the data and the LCDM prediction. In addition, we note that in almost all cases the DP and LS estimators show, on large enough scales, a difference that can be larger than the statistical error bars.

Fig. 2. The same as Fig. 1 but for the VL2 sample.

Fig. 3. The same as Fig. 1 but for the VL3 sample.

Fig. 4. The same as Fig. 1 but for the LRG sample.

Fig. 5. Correlation function in the whole sample VL4 and in a sub-sample of it (VL4c) limited at $R_{max} = 250$ Mpc/h. Jack-knife errors are shown in both cases.

Fig. 6. Correlation function measured through the LS estimator (with jack-knife errors) in the LRG sub-sample (R1), which is limited by $-33.5° < \eta < 36.0°$ and $-48.0° < \lambda < 0°$, i.e. with solid angle $\Omega = 0.9$ steradians. The solid line is the LCDM prediction.

It is worth noticing that our result for the LS estimator of the correlation function in the LRG sample agrees closely with the determination of Martínez et al. (2009), although these authors have used a slightly different technique to take into account the survey completeness mask, as we commented above (see Fig. 8).

Fig. 7. The same as Fig. 6 but for the R2 angular region, which is limited by $-33.5° < \eta < 36.0°$ and $0° < \lambda < 48°$, i.e. with solid angle $\Omega = 0.9$ steradians.

Fig. 8. Determination of the correlation function for the LRG sample with the LS estimator (LRG), compared with the Eisenstein et al. (2005) (E05) and the Martínez et al. (2009) (Martinez) determinations. The solid line is the LCDM prediction.

The LS estimator for the LRG sample is also very similar to the determination made by Eisenstein et al. (2005), although the signal is larger than the statistical error bars and positive up to 200 Mpc/h, as was also found by Martínez et al. (2009) in the same sample we considered. A similar trend was also seen in the analysis by Cabré and Gaztañaga (2008). In addition, our result for the MG sample agrees well with the determination of Zehavi et al. (2005b), although they limited their analysis to smaller scales than the ones considered in our analysis.

We note that Eisenstein et al. (2005) stated that the MG sample does not have a large enough volume to measure the correlation function on 100 Mpc/h scales, without giving a clear quantitative argument for why statistical or systematic errors should prevent one from measuring the correlation function on those scales. Indeed, we find that the signal-to-noise ratio, when JK error estimations are used, is larger than unity even on scales larger than 150 Mpc/h. In this respect one may ask whether statistical errors computed in this way are meaningful.

Fig. 9. Behavior of the ratio $\xi_{LS}(r)/\xi_{H}(r)$ as a function of separation in the different MG and LRG samples.

In addition we note that Martínez et al. (2009) also found that the correlation function becomes negative on scales of the order of 50 Mpc/h in a 2dFGRS sample, without however commenting on this fact. Actually they even claimed that $r_{bao}$ is detectable when the correlation function is negative, without discussing that this is not what one expects in the context of the LCDM model, where the zero point of the correlation function must be at a single scale for any type of object (see below).

We also find that, as discussed in Sect. 2, the LS and the H estimators of the correlation function are almost indistinguishable: this is shown in Fig. 9, where we plot the behavior of the ratio $\xi_{LS}(r)/\xi_{H}(r)$ as a function of separation. The two estimators differ by less than ~ 5% on all the relevant scales.

Finally, to check that the number of points $N_r$ used in the random sample does not alter the estimation, we have increased $N_r$ up to ten times $N_d$ without detecting any appreciable change. We conclude that the difference between the DP and LS estimators lies in the bias (finite-volume effect) intrinsic to the different ways these estimators take into account boundary conditions.



5. Fluctuations and volume-dependent systematic effects

In theoretical models, the matter density field is uniform on large scales and the average mass density $\langle n \rangle$ is provided by an average over an ensemble of realizations of a given stochastic process. In a finite sample of volume $V$, the average density $\overline{n}$ can be estimated in some way. In the limit in which the sample volume is infinite and the process is ergodic (Gabrielli et al. 2005), $\lim_{V\to\infty} \overline{n} = \langle n \rangle$, because in this limit the relative variance goes to zero if the distribution is uniform on large scales, i.e.

$$\lim_{V\to\infty} \sigma^2(V) = \lim_{V\to\infty} \frac{\langle \Delta N(V)^2 \rangle}{\langle N(V) \rangle^2} = 0 , \qquad (5)$$

where $N(V)$ is the mass in a volume $V$. In a finite volume $\sigma^2(V)$ is finite and therefore in any finite volume $\overline{n} \neq \langle n \rangle$. In general, for a uniform stochastic point process, in the ensemble average sense the relative mass variance can be written as

$$\sigma^2(V) = \frac{1}{V^2} \int_V \int_V \xi(\mathbf{r}_1 - \mathbf{r}_2)\, d^3r_1\, d^3r_2 + \frac{1}{\langle N(V) \rangle} , \qquad (6)$$

where $\xi(r)$ is the ensemble average two-point correlation function. The r.h.s. of Eq. (6) is the sum of the contributions to the variance due to correlations and to Poisson noise, the latter being always present in a point distribution.
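To get a feeling for the two contributions to the relative mass variance, the double volume integral can be evaluated by Monte Carlo sampling. The sketch below is purely illustrative: it assumes a cubic volume, an isotropic user-supplied $\xi(r)$, and hypothetical names (`relative_variance`, `box_size`, `mean_count`) that do not come from the analysis itself.

```python
import numpy as np

def relative_variance(xi, box_size, mean_count, n_pairs=100_000, seed=0):
    """Monte Carlo estimate of the relative mass variance in a cubic volume.

    xi: callable returning the two-point correlation at separations r.
    box_size: side of the cube (same length units as used by xi).
    mean_count: average number of points <N(V)> in the volume.
    """
    rng = np.random.default_rng(seed)
    r1 = rng.uniform(0.0, box_size, size=(n_pairs, 3))
    r2 = rng.uniform(0.0, box_size, size=(n_pairs, 3))
    sep = np.linalg.norm(r1 - r2, axis=1)
    # The double integral divided by V^2 is the mean of xi over random pairs
    correlation_term = np.mean(xi(sep))
    poisson_term = 1.0 / mean_count
    return correlation_term + poisson_term
```

For an uncorrelated distribution only the Poisson term survives, so `relative_variance(lambda r: np.zeros_like(r), 100.0, 1000)` returns $10^{-3}$; any positive correlation extending over the sample size raises the variance above this floor.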

Thus in a finite sample any determination of the average density $\overline{n}$ has an intrinsic error $\sigma(V)$. Given that the two-point correlation function determines the amplitude of correlations with respect to the sample density, it is natural to ask what error is introduced in the estimation of the correlation function by the uncertainty on the value of the sample density. A second question is which kind of statistical estimation of the correlation function errors in a finite sample is representative of the errors induced by the average density uncertainty.



5.1. Fluctuations in the determination of the sample density

The two-point correlation function is defined as

$$\xi(r) = \frac{\langle n(r)\, n(0) \rangle}{\langle n \rangle^2} - 1 = \frac{\langle n_p(r) \rangle}{\langle n \rangle} - 1 , \qquad (7)$$

where

$$\langle n_p(r) \rangle = \frac{\langle n(r)\, n(0) \rangle}{\langle n \rangle}$$

is the conditional density. Because of the definition in Eq. (7), any estimator of $\xi(r)$ can be written as

$$\overline{\xi}(r) = \frac{\overline{n_p(r)}}{\overline{n}} - 1 , \qquad (8)$$

where $\overline{n_p(r)}$ is the sample estimation of the conditional density and $\overline{n}$ is the sample estimation of the density. Note that, in general, to measure the conditional density, one performs an average over all points in the sample (Gabrielli et al., 2005). On the other hand, the estimation of the sample density does not involve an average operation. For instance, one can simply determine the sample density to be $\overline{n} = N/V$, where $V$ is the sample volume and $N$ is the number of objects in it.

In addition, it is worth noticing that the pair-wise estimators introduced in Sect. 2 necessarily use a similar strategy, as in order to measure the average of the sample density one would need many samples of size $V$. Thus, the determination of the two-point correlation function requires the estimation of an average quantity and of a non-average quantity. The latter can introduce volume-dependent systematic effects in a non-trivial way.

Suppose that a certain estimator $\overline{\xi}_1(r)$ of the two-point correlation function uses the sample estimation $\overline{n}_1$ while another estimator $\overline{\xi}_2(r)$ uses $\overline{n}_2$: the difference between $\overline{n}_1$ and $\overline{n}_2$ is not due to the fact that the samples are different, but rather to the fact that the different estimators use different boundary conditions to measure the two-point correlation function, i.e. different ways of normalizing the data-data pairs to the data-random and random-random pairs. Thus they are subject to a different bias (Kerscher 1999). Alternatively, one can think of measuring the same estimator in two different samples of the same geometry and volume, in which the sample density takes a slightly different value.

We now show that different values of the sample density may result in a different measurement of the large scale behavior of the correlation function. To this aim, let us assume that there is a small difference between the value of the sample density used by estimator 1 and by estimator 2, so that we can write

$$\overline{n}_2 = \overline{n}_1 (1 + \delta) \qquad (9)$$

with $\delta \ll 1$. Let us also suppose that the two estimators measure exactly the same conditional density $\overline{n_p(r)}$. This is a simplifying but reasonable assumption, as the conditional density is averaged over many points placed in different parts of the sample volume. In these conditions we may write that

$$\overline{\xi}_{1,2}(r) = \frac{\overline{n_p(r)}}{\overline{n}_{1,2}} - 1 \qquad (10)$$

and thus from Eqs. (9)-(10) we get

$$\overline{\xi}_2(r) = \frac{\overline{\xi}_1(r) + 1}{1 + \delta} - 1 \approx \overline{\xi}_1(r) - \delta , \qquad (11)$$

which makes explicit that a different determination of the sample density results in a variation of the estimated two-point correlation function.
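The size of this effect is easy to check numerically. The snippet below is a minimal sketch of the rescaling derived above; the function name `shift_estimator` and the sample values of the correlation function are hypothetical and serve only to show the orders of magnitude involved.

```python
import numpy as np

def shift_estimator(xi_1, delta):
    """Correlation function obtained when the sample density is rescaled by (1 + delta)."""
    xi_1 = np.asarray(xi_1, dtype=float)
    return (xi_1 + 1.0) / (1.0 + delta) - 1.0

# Hypothetical large-scale amplitudes, of the order of those discussed in the text
xi_1 = np.array([0.01, 0.002, -0.003])
xi_2 = shift_estimator(xi_1, delta=-0.006)
```

With a density lower by 0.6% every value is raised by about 0.006, so that an amplitude of $-0.003$ becomes positive: a sub-percent change in the normalization is enough to move the scale at which the estimated correlation function crosses zero.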

As an illustrative example, we can take as $\overline{\xi}_1(r)$ the LS estimator for the LRG sample. We find that, for $\delta = -0.006$, $\overline{\xi}_2(r)$ in Eq. (11) almost perfectly agrees with the DP estimator in the same sample (see Fig. 10). It is thus clear that a small uncertainty in the value of the sample average (in this case 0.6%) can affect the large scale behavior of the correlation function in the range of scales and of amplitudes of interest, i.e. around 100 Mpc/h in the LRG sample. Therefore, we have to determine the error on the estimation of the sample density and then clarify how this changes the large scale behavior of the correlation function. Is the above estimation of 0.6% representative of the true uncertainty on the large scale average density?

Simply stated, the problem is the following: in order to measure the BAO we need to have an error of about $10^{-3}$ on the estimator of the correlation function. Indeed, for the LRG case, the correlation function on 100 Mpc/h has an amplitude of about $10^{-2}$, while the feature corresponding to the BAO (a slight local increase followed by a decrease) corresponds to a local variation of about $10^{-3}$ in the correlation function amplitude.

By error propagation, we find from Eq. (8) that

$$\delta\overline{\xi}(r) \approx \frac{\delta\overline{n_p(r)}}{\overline{n}} + \frac{\overline{n_p(r)}}{\overline{n}}\, \sigma , \qquad (12)$$

where $\sigma$ is the relative error on the sample density. We neglect again the statistical error on the determination of the conditional density on scales smaller than the sample, i.e. the first term in the r.h.s. of Eq. (12). As discussed above, this approximation is reasonable in view of the fact that the conditional density is determined by making an average over many points. Then, by using again Eq. (8), we can rewrite the previous equation as

$$\delta\overline{\xi}(r) \approx \left( \overline{\xi}(r) + 1 \right) \sigma \approx \sigma , \qquad (13)$$

where we used that $\overline{\xi}(r) \ll 1$, as this is the regime in which we are interested. From Eq. (13) it follows that the error on the correlation function estimation is of the same order as the error in the estimation of the sample density. Therefore the question is whether we really know the sample density with an error of the order of $10^{-3}$.

Fig. 10. By taking the LS estimator in the LRG sample for $\overline{\xi}_1(r)$ in Eq. (11) we find that for $\delta = -0.006$ the quantity $\overline{\xi}_2(r)$ (labeled as LS_c) almost perfectly agrees with the DP estimator in the same sample.



The typical fluctuation on the density estimation in a given sample, on scales of the order of the sample size, is $\sigma$. The problem is to constrain $\sigma$ from the data. As mentioned above, it is not possible to make an average over many samples of volume $V$, as we have a single one, and thus we can determine the fluctuation only inside the sample itself by considering several sub-samples of it.

We have estimated $\sigma$ on the relevant scales as follows. We divide the sample into $N$ independent (non-overlapping) angular fields and then we determine the number of galaxies in each field. We then compute the average number of galaxies per field $\overline{N}$ and the variance $\Sigma^2$, and thus the standard deviation as

$$\sigma = \frac{\sqrt{\Sigma^2}}{\overline{N}} . \qquad (14)$$

As there is an arbitrariness in the choice of the number of fields $N$, we let it vary between a few, for which we have more than $10^4$ objects in each field, to some tens, to have at least several hundreds of galaxies in each field.
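The field-to-field estimate can be sketched in a few lines. The function below is illustrative only: its name, the use of the population variance (rather than the unbiased sample variance) and the comparison with the Poisson expectation $1/\sqrt{\overline{N}}$ are assumptions, not details given in the text.

```python
import numpy as np

def field_fluctuation(counts):
    """Relative fluctuation of galaxy counts among non-overlapping fields.

    counts: number of galaxies in each angular field.
    Returns (sigma, poisson), where sigma is the standard deviation of the
    counts divided by their mean and poisson is the value expected for an
    uncorrelated distribution with the same mean count.
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean()
    sigma = counts.std() / mean      # population standard deviation (assumed)
    poisson = 1.0 / np.sqrt(mean)    # Poisson expectation for comparison
    return sigma, poisson
```

Repeating the call for different numbers of fields shows whether the measured fluctuation stays well above the Poisson level as the field size changes.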

From Fig. 11 we may note that in the LRG sample the typical fluctuation is about 8% for almost any value of $N$, and that this is much larger than Poisson noise, i.e. almost a factor 100 larger than the error needed to measure the correlation function with a precision of the order of $10^{-3}$!

Note that this value of the typical fluctuation is in agreement
with that obtained in a smaller LRG sample by Hogg et al. (2004).
For the MG samples we find that $\sigma$ has about the same
amplitude as for the LRG case (see Fig. 12). Thus, given that
$\sigma < 0.1$, we conclude that in these samples we can get a
statistically significant estimation of the correlation function only
for $|\xi(r)| > |\delta\xi(r)| \approx \sigma \approx 0.1$, and thus
any claim about smaller amplitudes is biased by overall
volume-dependent systematic effects. This implies that, for the LRG
sample, our estimation is statistically significant for $r < 50$
Mpc/h. To measure correlations of smaller amplitude, and thus on
larger scales, we need samples in which the typical fluctuation of
the average density is at least a factor of ten smaller than the
present one.
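
To see why a fluctuation $\sigma$ in the density estimate limits the measurable amplitude, one may assume that the estimated density differs from the reference one by a relative error $\delta$, so that the measured correlation function becomes $(1 + \xi)/(1 + \delta) - 1 \approx \xi - \delta$ for small $\xi$. The sketch below adopts this simplified relation, which is an assumption made here for illustration; the function name and the input values are hypothetical.

```python
import numpy as np

def xi_with_density_error(xi_true, delta):
    """Correlation function obtained when the sample density is
    mis-estimated by a relative error delta (simplified model)."""
    xi_true = np.asarray(xi_true, dtype=float)
    return (1.0 + xi_true) / (1.0 + delta) - 1.0

xi_true = np.array([1.0, 0.1, 0.01, 0.001])   # hypothetical true amplitudes
sigma = 0.08                                   # 8% density fluctuation

xi_low = xi_with_density_error(xi_true, +sigma)   # density overestimated
xi_high = xi_with_density_error(xi_true, -sigma)  # density underestimated

for true, low, high in zip(xi_true, xi_low, xi_high):
    print(f"true {true:6.3f}  measured between {low:+.3f} and {high:+.3f}")
```

In this toy setting, amplitudes well above $\sigma$ are recovered to within a modest relative error, while for amplitudes below $\sigma$ even the sign of the measured value depends on the sign of the density error.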

Note that for the case of MG samples, and specifically for
VL2 and VL3, the amplitude of the correlation functions is of
the order of $\sigma$ up to 250 Mpc/h. We stress, however, that one
should also check whether the property of self-averaging is
satisfied in these samples, and thus whether the determination
of average quantities gives a meaningful estimation of intrinsic
properties (Sylos Labini et al., 2009c,d).

Fig. 11. The typical fluctuation in the LRG sample average
density is about 8% for almost any value of $N$ in the range [4, 30],
and it is much larger than Poisson noise.

Fig. 12. The same as Fig. 11 but for the VL1, VL2 and VL3 samples.

While the above argument about error propagation strictly
applies when we determine the correlation function by considering
Eq. (8), we show in what follows that the above estimation
also holds in the case of the DP and LS estimators. To show this,
let us now compute statistical error bars in a different way than
with the JK method.



5.2. Statistical errors 

The errors on the correlation function can be determined in various
manners, and the problem is to understand, in the case of the
actual distribution, which method gives the most reliable error
estimation. To this aim, let us consider in more detail the
computation of JK errors: in practice, one keeps the sample density
almost fixed and computes the typical variation with respect to it.
Indeed, we recall that each of the sub-fields used in the JK
estimation is equal to the full sample without a small sub-field
of angular area equal to $1/N$ of the full sample area. Therefore
the different sub-fields are strongly overlapping: when large-scale
correlations are not negligible, this method underestimates the
errors in the correlation function estimation.

We find that in the LRG sample the variation of the sample
density in the $N = 30$ sub-fields used to compute the JK errors,
i.e. $\sigma \approx 5 \cdot 10^{-3}$, is smaller than what is
estimated by computing the variance in non-overlapping sub-fields.
This result does not show a particular dependence on the number of
sub-fields used as long as $N > 10$.
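
The effect of the overlap on the density can be made explicit with a short sketch. It compares the relative scatter of the density among $N$ non-overlapping fields with the raw scatter among the $N$ leave-one-out sub-samples built from the same fields. The counts and the function name are hypothetical, and equal-area fields are assumed.

```python
import numpy as np

def density_scatter(counts):
    """Relative scatter of the density among non-overlapping fields and
    among the leave-one-out (jackknife) sub-samples built from them."""
    counts = np.asarray(counts, dtype=float)
    n_fields = counts.size
    # Non-overlapping fields: each count is a separate density estimate.
    field_sigma = counts.std(ddof=1) / counts.mean()
    # Jackknife sub-sample i is the full sample minus field i; its density
    # (per unit field area) is the mean count of the remaining fields.
    jk_density = (counts.sum() - counts) / (n_fields - 1)
    jk_sigma = jk_density.std(ddof=1) / jk_density.mean()
    return field_sigma, jk_sigma

counts = [10400, 9100, 11200, 9800, 8900, 10600]  # hypothetical counts
field_sigma, jk_sigma = density_scatter(counts)
print(field_sigma, jk_sigma, field_sigma / jk_sigma)
```

By construction the two scatters differ by a factor $N - 1$, because each leave-one-out density shares all but one field with every other one. The usual jackknife variance formula rescales for this overlap under the assumption that the fields are statistically independent.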

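Field-to-field errors can quantify volume-dependent systematic
effects due to large-scale variations of the sample density.
They can be computed by dividing the sample into $N$
non-overlapping sub-fields. Denoting by $\xi_i(r)$ the correlation
function measured in the $i$-th sub-field, the correlation function
can be estimated by

$$\overline{\xi}(r) = \frac{1}{N} \sum_{i=1}^{N} \xi_i(r) \qquad (15)$$

and then the variance is

$$\sigma_{\xi}^2(r) = \frac{1}{N-1} \sum_{i=1}^{N} \left( \xi_i(r) - \overline{\xi}(r) \right)^2 \qquad (16)$$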
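
A minimal numerical sketch of Eqs. (15) and (16) follows. It assumes that the correlation function has already been measured in each sub-field on a common set of separation bins; the array layout, the function name and the numbers are hypothetical.

```python
import numpy as np

def field_to_field(xi_fields):
    """Mean and field-to-field error of the correlation function.

    xi_fields has shape (N, n_bins): one row per non-overlapping
    sub-field, one column per separation bin.
    """
    xi_fields = np.asarray(xi_fields, dtype=float)
    n_fields = xi_fields.shape[0]
    xi_mean = xi_fields.mean(axis=0)                                      # Eq. (15)
    variance = ((xi_fields - xi_mean) ** 2).sum(axis=0) / (n_fields - 1)  # Eq. (16)
    return xi_mean, np.sqrt(variance)

# Three hypothetical sub-fields, two separation bins.
xi_fields = [[0.10, 0.02],
             [0.14, -0.01],
             [0.12, 0.05]]
xi_mean, xi_err = field_to_field(xi_fields)
print(xi_mean, xi_err)
```

In the second bin of this invented example the error (0.03) exceeds the mean (0.02), which is the situation described in the text for scales where the field-to-field error is larger than the signal.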



In Fig. 13 we show the behavior of the errors in the LRG
sample, computed by Eq. (4) and Eq. (16), considering 10, 20
and 30 fields. One may note that (i) the field-to-field error is
larger than the signal for $r > 50$ Mpc/h, i.e. on larger scales the
amplitude of the estimated correlation function is
$\xi(r) \sim \sigma$; (ii) the field-to-field error is larger than
the JK error on all scales by about a factor of five. Note that the
JK errors are similar to those derived by Cabré and Gaztañaga (2008).
The field-to-field errors are much larger, and they could be
over-estimates because the fields used are smaller than the full
sample. To check whether this is the case, we can vary the number of
sub-fields used to estimate the field-to-field fluctuations, as we
did, for instance, to compute the typical rms fluctuation on the
average density (see Figs. 11-12). Clearly, by reducing the number of
fields one has fewer determinations, while by increasing $N$ one is
eventually dominated by shot noise. For $N$ in the range [10, 30] we
do not notice any clear decrease in the field-to-field errors. Our
conclusion is therefore that the JK error is not the complete error
but only the sampling error, while the field-to-field fluctuations
include the possible fluctuations due to the uncertainty on the
sample density estimation and should thus be larger than or equal to
the JK errors. An additional problem, which we consider in the next
section, is whether the statistical errors measured by considering
non-overlapping fields are able to take into account the whole
uncertainty on the sample average, i.e. whether they can take into
account the bias of the estimators.

Note that the behavior of the correlation function in the MG
VL2 and VL3 samples on large enough scales, i.e. $r \sim 200$
Mpc/h, is the same when considering both JK and field-to-field
errors, thus showing that there are positive correlations on scales
larger than the cut-off $r_c \approx 120$ Mpc/h predicted by the
LCDM model, without statistically robust evidence of the $r_{bao}$
scale at $\approx 110$ Mpc/h.

5.3. Large scale volume-dependent systematic effects 

The simple estimation of field-to-field errors allows one to
overcome the problem related to the JK method, in which the
implicit assumption is that correlations on the scale of the sample
are negligible. However, the field-to-field method is not able to
take into account the full errors on the correlation function
estimation. This is because the sample density is systematically
different from the ensemble average density when the correlation
function is non-zero at large scales. This introduces a well-known
bias, i.e. a volume-dependent systematic effect. Let us
discuss this further effect.

Fig. 13. Jack-knife errors and field-to-field errors computed
with different numbers of fields, $N = 10, 20, 30$, in the LRG
sample, as a function of $r$ (Mpc/h). The solid line corresponds to
the full-sample determination of the LS estimator.

Most of the literature on correlation function measurements
has focused on the determination of the statistical errors
(Zehavi et al., 2002, 2005a; Norberg et al., 2002;
Eisenstein et al., 2005; Norberg et al., 2009), while little
attention has been devoted to understanding the distortions
introduced by volume-dependent systematic effects. These depend
on the precise type of estimator used, but at large enough scales
they affect, in some way, any estimator $\xi(r; V)$ in a finite
sample of volume $V$ (Sylos Labini and Vasilyev, 2008).

For instance, an important volume-dependent systematic effect
is related to the so-called integral constraint (Peebles, 1980)
and can be understood as follows. The estimator $\xi(r; V)$
measures the amplitude and shape of conditional correlations
normalized to the estimation of the sample mean instead of to the
"true" (ensemble or infinite-volume limit) average density
(Sylos Labini and Vasilyev, 2008). As long as the "true"
correlation function is different from zero (e.g. in the case of
LCDM, on all scales), any estimation of the average density in a
finite sample differs from the "true" value. This situation
introduces a systematic distortion of $\xi(r; V)$ with respect to
$\xi(r)$ which, depending on the correlation properties of the
underlying distribution, is manifested in (i) an overall difference
in amplitude and (ii) a distortion of the shape for $r < V^{1/3}$
(Sylos Labini and Vasilyev, 2008).

In other words, the zero point of the correlation function will
differ between samples of different size only if it is due to the
boundary condition corresponding to the integral constraint. If the
zero point is real, as it should be in an LCDM model, then it should
not change from sample to sample.

The definition of the range of scales in which this effect
occurs depends on the precise estimator used. For instance,
in the case of the full-shell (FS) estimator (Gabrielli et al., 2005;
Sylos Labini and Vasilyev, 2008) and for a spherical sample volume,
our ignorance of the "true" average density value is explicitly
present in the condition that $\int_V \xi(r; V)\, d^3r = 0$, where
the integral is performed over the whole sample volume $V$. Note that
this condition holds for any $V$ and it forces the estimator to
become negative even if the "true" $\xi(r)$ is always positive inside
the given sample. The effect of this boundary condition is the
following: as long as the "true" correlation function is positive,
by enlarging the volume size the change of sign occurs at larger
and larger scales (Sylos Labini and Vasilyev, 2008). This effect
may very well explain the behavior found in the MG VL samples
discussed above, in which we noticed that the transition scale
changes from $r \approx 50$ Mpc/h for the smallest sample to more
than 250 Mpc/h for the deepest sample we considered. Note that if
the "true" correlation function is negative, then the distortion on
large scales can be rather important (Sylos Labini and Vasilyev,
2008).
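
The action of this boundary condition can be illustrated with a toy calculation. The sketch assumes a "true" correlation function that is a positive power law, $\xi(r) = (r_0/r)^\gamma$, and an idealized FS-like estimator in which the conditional density is normalized by its own average over a sphere of radius $R_s$, so that $\xi(r; V) = [1 + \xi(r)]/[1 + \overline{\xi}(R_s)] - 1$. The power law, the parameter values and all function names are assumptions chosen for illustration, not the LCDM case and not the analysis pipeline of this paper.

```python
import numpy as np

def xi_true(r, r0=5.0, gamma=1.0):
    """Toy 'true' correlation function: a positive power law."""
    return (r0 / r) ** gamma

def xi_bar(r_sample, r0=5.0, gamma=1.0):
    """Volume average of xi_true over a sphere of radius r_sample (gamma < 3)."""
    return 3.0 / (3.0 - gamma) * (r0 / r_sample) ** gamma

def xi_estimated(r, r_sample, r0=5.0, gamma=1.0):
    """Idealized estimator normalized to the density measured in the sample."""
    return (1.0 + xi_true(r, r0, gamma)) / (1.0 + xi_bar(r_sample, r0, gamma)) - 1.0

def zero_crossing(r_sample, gamma=1.0):
    """Scale where xi_estimated vanishes, i.e. where xi_true equals xi_bar."""
    return r_sample * ((3.0 - gamma) / 3.0) ** (1.0 / gamma)

def constraint_integral(r_sample, n_bins=100000):
    """Midpoint-rule integral of xi_estimated over the sample sphere."""
    edges = np.linspace(0.0, r_sample, n_bins + 1)
    r = 0.5 * (edges[:-1] + edges[1:])   # midpoints avoid r = 0
    dr = np.diff(edges)
    return np.sum(xi_estimated(r, r_sample) * 4.0 * np.pi * r**2 * dr)

for r_sample in (50.0, 100.0, 200.0):
    print(r_sample, zero_crossing(r_sample), constraint_integral(r_sample))
```

In this toy model the integral over the sample vanishes to numerical precision for every sample size, and the zero crossing sits at a fixed fraction of $R_s$ (two thirds for $\gamma = 1$), so it moves outward as the sample grows although the underlying correlation function is positive everywhere.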

While for the FS estimator one can analytically calculate the
scale at which the systematic departure from the "true" shape
occurs, for more complex estimators based on pair counting, like
the LS one, the ways in which this boundary condition affects the
measured correlations can be understood only through numerical
simulations. This complication is the price of the advantage that
these estimators can measure correlations on scales larger than
those sampled by the FS estimator (Sylos Labini and Vasilyev, 2008).
For pair-counting estimators it has been numerically shown
(Sylos Labini and Vasilyev, 2008) that, when fluctuations in the
sample density are small enough, $r_c \propto V^{1/3}$; the
pre-factor of this proportionality depends on the type of estimator
and on the sample geometry. However, we note that large-scale
fluctuations may alter this systematic behavior as a function of the
sample volume in a non-trivial way (see e.g. Sylos Labini et al.
(2009a,b,c,d)).

Note that the simple computation of how the error in the
average density propagates into the error on the correlation
function does not explicitly take into account the situation in
which the sample density itself can be a varying function of
the sample size (we refer the interested reader to Sylos Labini
et al. (2009a,d) for a more complete discussion of this important
point). Indeed, as mentioned above, the estimated sample
average converges to the asymptotic average density at a rate
determined by the decay of the two-point correlation function.
When correlations are strong, there can be an important
finite-volume dependence of the sample density, resulting in
similar finite-size effects in the two-point correlation function
(Sylos Labini and Vasilyev, 2008).



6. Theoretical implications 

To theoretically interpret these results it is necessary to take into
account an important complication which changes the predictions
of standard models described in the introduction. Indeed,
these refer to the whole matter density field (dark and luminous),
while we observe only a part of it in the form of luminous matter
(i.e. galaxies). The relation between galaxy and dark matter
distributions is usually formulated in terms of bias: the former
represents a certain (physical) sampling of the latter. There are
two different relevant regimes. At non-linear scales, where the
distribution has strong clustering characterized by non-Gaussian
fluctuations, this relation can be studied only through numerical
models (Springel et al., 2005; Croton et al., 2006). Instead,
on scales where perturbations are small and clustering is in the
linear regime, there is a simple picture based on the threshold
sampling of a Gaussian random field (Kaiser, 1984). In the latter
case one may derive analytically that the "biased" two-point
correlation function is linearly amplified by threshold sampling
(Kaiser, 1984). This is found to occur also in the non-linear
regime but under different conditions, as shown by numerical
N-body simulations (Springel et al., 2005; Croton et al., 2006): the
effect of biasing is to linearly amplify the correlation function,
while the simple threshold sampling of a Gaussian random field
predicts a strongly scale-dependent amplification of the
correlation function in the non-linear regime (Gabrielli et al.,
1999).

Therefore the prediction of the non-linearity scale $r_0$ for the
full matter distribution (which, in current models, is
$r_0 \approx 8$ Mpc/h) gives only an approximate estimate for that
of galaxies of different luminosity. Indeed, this scale has been
found to vary slightly in N-body simulations (Springel et al., 2005).
On the other hand, the scale $r_c$ is not affected by biasing, for
the simple reason that it is located, in current models, at about
$r_c \approx 120$ Mpc/h, where fluctuations have low amplitude and
thus where both biasing and gravitational clustering give rise to a
linear amplification of the correlation function. Hence, given that
for $r > r_c$ there are no positive correlations in the whole matter
density field, these will not be present in the galaxy distribution,
as they cannot be generated by a biasing mechanism. Thus the length
scale $r_c$ is invariant with respect to biasing, i.e. it must be
the same for any class of objects as for the whole matter density
field. It is then a fundamental scale to be measured in the observed
galaxy distribution to verify the class of models characterized by
the Harrison-Zeldovich tail of the matter power spectrum. Finally,
the third length scale in current models is the BAO scale, located
at $r_{bao} \approx r_c$, and it is weakly affected by gravitational
evolution and biasing (Eisenstein and Hu, 1998).
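
The invariance argument can be checked numerically in a few lines: multiplying a correlation function by a scale-independent factor changes its amplitude but not the scale at which it changes sign. The curve below is an arbitrary toy function with a zero at 120 Mpc/h, not an LCDM prediction, and the bias value and function names are hypothetical.

```python
import numpy as np

def first_zero_crossing(r, xi):
    """First scale at which xi goes from positive to non-positive,
    located by linear interpolation between grid points."""
    r = np.asarray(r, dtype=float)
    xi = np.asarray(xi, dtype=float)
    idx = np.where((xi[:-1] > 0) & (xi[1:] <= 0))[0]
    if idx.size == 0:
        return None                      # no sign change on this grid
    i = idx[0]
    return r[i] - xi[i] * (r[i + 1] - r[i]) / (xi[i + 1] - xi[i])

r = np.linspace(10.0, 300.0, 581)                           # separations in Mpc/h
xi_matter = 0.01 * (1.0 - r / 120.0) * np.exp(-r / 100.0)   # toy curve, zero at 120
bias = 2.0                                                  # hypothetical linear bias
xi_galaxy = bias**2 * xi_matter                             # linear amplification

rc_matter = first_zero_crossing(r, xi_matter)
rc_galaxy = first_zero_crossing(r, xi_galaxy)
print(rc_matter, rc_galaxy)
```

A scale-dependent amplification would not in general preserve the shape of the curve, but as long as the factor is positive it cannot create positive correlations where the underlying function is negative.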

That the scales $r_{bao}$ and $r_c$ are invariant under biasing is
shown by the analysis of the N-body simulations provided by the
Horizon project (Kim et al., 2009), where it is found that these
are the same for the whole matter distribution and for the
sub-sample of particles corresponding to the LRG (see their Fig. 5).

7. Conclusions 

In the newest SDSS samples it is possible to measure the
correlation function on ~ 100 Mpc/h scales both in the Main Galaxy
(MG) and in the Luminous Red Galaxy (LRG) samples. We
measured, in the former case, positive correlations extending up
to a factor of two beyond the scale $r_c \approx 120$ Mpc/h, at
which in the LCDM model $\xi(r)$ should cross zero, being negative
on larger scales. However, in nearby samples we found that positive
correlations are detectable only up to ~ 50 Mpc/h. Therefore we
concluded that in these samples the correlation function shows
a rather different behavior from the LCDM model prediction and
that there is no statistically significant evidence for the scale
corresponding to the baryonic acoustic oscillations (BAO). Moreover,
we found that the estimated two-point correlation function in
different MG VL samples shows a clear dependence on the sample
volume. We concluded that the overall errors in the estimation
of the correlation function cannot be simply evaluated by the
computation of statistical error bars (e.g. JK), but can only
be studied by making systematic tests in samples with different
volumes.

In addition, we have shown that, in the LRG sample, the
uncertainty on the sample density estimation does not allow one to
measure the correlation function on scales of the order of ~ 100
Mpc/h. Rather, it puts an upper limit to the estimation of
correlations at about ~ 50 Mpc/h. More specifically, the fluctuation
on the estimation of the sample density for the LRG sample is of the
order of 8%. This is, as we have discussed, of the same order as
the errors in the correlation function. We have pointed out that, in
order to measure the small bump in the correlation function
associated with the BAO scale, one would need samples in which
the fluctuation on the estimated density is more than ten times
lower than the value found in the LRG-DR7 sample.

For this reason we concluded that in the LRG sample there
is no statistical evidence for the BAO and that previous
measurements (Eisenstein et al., 2005; Cabré and Gaztañaga, 2008;
Martínez et al., 2009) have underestimated the error bars in the
estimation of the correlation function and neglected the possible
effect of the bias in the estimator. This is due to the fact that
they have measured statistical errors by means of the JK method,
which computes the sample variance by considering different
sub-samples that are strongly overlapping. If large-scale
correlations are not negligible, this method underestimates the
errors in the correlation function. We have shown that a more
reliable way to compute statistical error bars is given by the
simple estimation of field-to-field fluctuations. However, we have
pointed out that even this method is not able to properly take into
account overall volume-dependent effects, i.e. the estimator's bias,
related to our ignorance of the ensemble average density.

Determinations of correlations through measurements
of the galaxy power spectrum (Cole et al., 2005)
are affected by similar volume-dependent systematic effects
(Sylos Labini and Amendola, 1996). In addition, one must take
into account that threshold sampling of a Gaussian field does
change the shape of the power spectrum on large enough scales,
i.e. at small enough wave-numbers (Durrer et al., 2003). A
similar situation should occur in the case of the halo models
(Gabrielli et al., 2005).

This situation represents an important challenge for models,
especially in view of the fact that the galaxy distribution does
not present the negative correlations predicted by models up
to scales larger than ~ 250 Mpc/h. Our conclusion is that,
in view of the finite-volume effects, the estimation of
correlations presented here must be intended as a lower limit to the
real correlations characterizing the large-scale distribution of
galaxies. Future surveys, like the extended SDSS III project
(Schlegel et al., 2009), may allow us to study the behavior of the
galaxy correlation function on scales larger than those considered
here. To understand how volume-dependent systematic effects
perturb correlation measurements and to make tests on the
volume stability of statistical quantities, it is necessary to
consider a more complete statistical analysis that focuses on
conditional fluctuations (Sylos Labini et